
    PILER-CR: Fast and accurate identification of CRISPR repeats

    BACKGROUND: Sequencing of prokaryotic genomes has recently revealed the presence of CRISPR elements: short, highly conserved repeats separated by unique sequences of similar length. The distinctive sequence signature of CRISPR repeats can be found using general-purpose repeat- or pattern-finding software tools. However, the output of such tools is not always ideal for studying these repeats, and significant effort is sometimes needed to build additional tools and perform manual analysis of the output. RESULTS: We present PILER-CR, a program specifically designed for the identification and analysis of CRISPR repeats. The program executes rapidly, completing a 5 Mb genome in around 5 seconds on a current desktop computer. We validate the algorithm by manual curation and by comparison with published surveys of these repeats, finding that PILER-CR has both high sensitivity and high specificity. We also present a catalogue of putative CRISPR repeats identified in a comprehensive analysis of 346 prokaryotic genomes. CONCLUSION: PILER-CR is a useful tool for rapid identification and classification of CRISPR repeats. The software is donated to the public domain. Source code and a Linux binary are freely available at
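
The sequence signature described above (short conserved repeats separated by unique spacers of similar length) can be illustrated with a naive k-mer scan. This is only a sketch and is not the PILER-CR algorithm; the function name `find_spaced_repeats` and all parameter defaults are assumptions chosen for illustration.

```python
from collections import defaultdict


def find_spaced_repeats(seq, k=8, min_copies=3, min_gap=20, max_gap=60):
    """Return {kmer: [start positions]} for exact k-mers that recur at
    least `min_copies` times with every consecutive start-to-start
    distance inside [min_gap, max_gap]. Toy heuristic, exact matches only."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    hits = {}
    for kmer, pos in positions.items():
        if len(pos) < min_copies:
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        if all(min_gap <= g <= max_gap for g in gaps):
            hits[kmer] = pos
    return hits


# Hypothetical toy locus: one 8-nt repeat followed by three distinct 20-nt spacers.
repeat = "GTTTCAAT"
spacers = ["AC" * 10, "AG" * 10, "AT" * 10]
locus = "".join(repeat + s for s in spacers)
print(find_spaced_repeats(locus).get(repeat))
```

A real tool must also tolerate mismatches within repeats and check spacer uniqueness, which this sketch does not attempt.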

    Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution

    The standard approach to analyzing 16S tag sequence data, which relies on clustering reads by sequence similarity into Operational Taxonomic Units (OTUs), underexploits the accuracy of modern sequencing technology. We present a clustering-free approach to multi-sample Illumina datasets that can identify independent bacterial subpopulations regardless of the similarity of their 16S tag sequences. Using published data from a longitudinal time-series study of human tongue microbiota, we are able to resolve up to 20 distinct subpopulations within standard 97% similarity OTUs, all ecologically distinct but with 16S tags differing by as little as 1 nucleotide (99.2% similarity). A comparative analysis of oral communities of two cohabiting individuals reveals that most such subpopulations are shared between the two communities at 100% sequence identity, and that dynamical similarity between subpopulations in one host is strongly predictive of dynamical similarity between the same subpopulations in the other host. Our method can also be applied to samples collected in cross-sectional studies and can be used with the 454 sequencing platform. We discuss how the sub-OTU resolution of our approach can provide new insight into factors shaping community assembly. Comment: Updated to match the published version. 12 pages, 5 figures + supplement. Significantly revised for clarity, references added, results not changed
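
The similarity figures quoted above can be reproduced with a simple per-position identity calculation. The sketch below assumes equal-length, ungapped tags; the tag length of 125 nt is not stated in the abstract and is used here only because 124/125 equals the quoted 99.2%. The function name `percent_identity` is hypothetical.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length tags agree."""
    if len(a) != len(b):
        raise ValueError("tags must have the same length")
    if not a:
        raise ValueError("tags must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Assumed example: two 125-nt tags differing at a single position.
tag_a = "A" * 125
tag_b = "A" * 124 + "C"
print(round(percent_identity(tag_a, tag_b), 1))  # 99.2
```

Both tags would fall into the same 97% OTU, which is why a clustering step at that threshold cannot separate them.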

    Expedited batch processing and analysis of transposon insertions

    Background: With advances in sequencing technology, greater and greater amounts of eukaryotic genome data are becoming available. Often, large portions of these genomes consist of transposable elements, frequently accounting for 50% or more in vertebrates. Each transposable element family may have thousands or tens of thousands of individual copies within a given genome, and therefore it can take an exorbitant amount of time and effort to process data in a meaningful fashion. Findings: In order to combat this problem, we developed a set of bioinformatics techniques and programs to streamline the analysis. This includes a unique Perl script which automates the process of taking BLAST, RepeatMasker and similar data to extract and manipulate the hit sequences from the genome. This script, called Process_hits, uses an object-oriented methodology to compile all hit locations from a given file for processing, organize this data into usable categories, and output it in multiple formats. Conclusions: The program proved capable of handling large amounts of transposon data in an efficient fashion. It is equipped with a number of useful sub-functions, each of which is contained within its own sub-module to allow for greater expandability and as a foundation for future program design.
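
The hit-extraction step described above can be sketched as follows. This is not the Process_hits script (which is written in Perl); it is a minimal Python illustration that assumes BLAST tabular output with the standard 12 columns (subject start and end in columns 9 and 10) and a genome already loaded into a dictionary. The names `extract_hits` and `revcomp` are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_hits(blast_lines, genome):
    """Collect (query, subject, start, end, strand, sequence) per hit.

    blast_lines: iterable of tab-separated BLAST tabular lines.
    genome: dict mapping subject name -> sequence string.
    Coordinates are 1-based and inclusive, as in BLAST output.
    """
    hits = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        lo, hi = min(sstart, send), max(sstart, send)
        seq = genome[subject][lo - 1:hi]
        strand = "+" if sstart <= send else "-"
        if strand == "-":
            seq = revcomp(seq)  # minus-strand hits are reported reversed
        hits.append((query, subject, lo, hi, strand, seq))
    return hits


# Hypothetical input: two hits of element "te1" on a 10-nt toy contig.
genome = {"chr1": "AACCGGTTAC"}
lines = [
    "te1\tchr1\t100.0\t4\t0\t0\t1\t4\t3\t6\t1e-5\t10",
    "te1\tchr1\t100.0\t4\t0\t0\t1\t4\t8\t5\t1e-5\t10",
]
for hit in extract_hits(lines, genome):
    print(hit)
```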

    Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

    Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently

    The Minang Kaluak Paku Kacang Balimbiang Motif in Casual Wear

    The Minangkabau, one of the ethnic groups that contribute to the distinctiveness of Indonesian culture, have a cultural heritage spread across many aspects of life. One part of this heritage is the art of carving. Carving, developed by drawing ideas from nature, carries philosophical meanings for the life of Minangkabau society. All the carvings on the Rumah Gadang show that the key elements shaping Minangkabau culture mirror what is found in nature. One of the carvings on the rumah gadang is kaluak paku, the name of a carving motif in Minangkabau custom. It derives from the curl (kelukan/kaluak) at the tip of a young fern (paku). The kaluak paku carving on the rumah gadang symbolizes a man's responsibility in Minangkabau custom toward the next generation, as father to his children and as mamak (maternal uncle) to his kemenakan (nephews and nieces). This Minangkabau kaluak paku carving is the source of ideas for the garments created in this final project. The work used several methods: aesthetic and ergonomic approaches, data collection through literature study, and a creation method based on Gustami Sp's theory of 3 stages and 6 steps. Making the work required reference data, which were gathered from literature sources: books, journals on social media, and smartphone applications such as Pinterest. The most important data collected were images of the visual form of the Minangkabau kaluak paku carving and of casual wear. The result is 8 casual garments. All of the works share an A-line silhouette that flares toward the bottom. The main material used is primisima. The color scheme uses the characteristic Minangkabau colors taken from the traditional flag, the "marawa": red, black, and yellow. The works produced with these colors fit well with the theme, which takes up the kaluak paku carving of the Minangkabau rumah gadang. Keywords: Minang, Kaluak Paku Kacang Balimbiang, Casual

    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Review

    Risk of Cerebrovascular Events in 178 962 Five-Year Survivors of Cancer Diagnosed at 15 to 39 Years of Age: The TYACSS (Teenage and Young Adult Cancer Survivor Study)

    Background: Survivors of teenage and young adult (TYA) cancer are at risk of cerebrovascular events, but the magnitude of this risk and the extent to which it varies by cancer type, decade of diagnosis, age at diagnosis and attained age remain uncertain. This is the largest ever cohort study to evaluate the risks of hospitalisation for a cerebrovascular event among long-term survivors of TYA cancer. Methods: The population-based Teenage and Young Adult Cancer Survivor Study (N=178,962) was linked to Hospital Episode Statistics data for England to investigate the risks of hospitalisation for a cerebrovascular event among 5-year survivors of cancer diagnosed when aged 15-39 years. Observed numbers of first hospitalisations for cerebrovascular events were compared with those expected from the general population using standardised hospitalisation ratios (SHR) and absolute excess risks (AER) per 10,000 person-years. Cumulative incidence was calculated with death considered a competing risk. Results: Overall, 2,782 cancer survivors were hospitalised for a cerebrovascular event, 40% more than expected (SHR=1.4, 95% confidence interval [CI]=1.3-1.4). Survivors of central nervous system (CNS) tumours (SHR=4.6, CI=4.3-5.0), head & neck tumours (SHR=2.6, CI=2.2-3.1) and leukaemia (SHR=2.5, CI=1.9-3.1) were at greatest risk. Males had a significantly higher AER than females (AER=7 versus 3), especially among head & neck tumour survivors (AER=30 versus 11). By age 60, 9%, 6% and 5% of CNS tumour, head & neck tumour, and leukaemia survivors, respectively, had been hospitalised for a cerebrovascular event. Beyond age 60, every year 0.4% of CNS tumour survivors were hospitalised for a cerebral infarction (versus 0.1% expected), whereas at any age, every year 0.2% of head & neck tumour survivors were hospitalised for a cerebral infarction (versus 0.06% expected).
Conclusions: Survivors of a CNS tumour, head & neck tumour, and leukaemia are particularly at risk of hospitalisation for a cerebrovascular event. The excess risk of cerebral infarction among CNS tumour survivors increases with attained age. For head & neck tumour survivors this excess risk remains high across all ages. These groups of survivors, and in particular males, should be considered for surveillance of cerebrovascular risk factors and potential pharmacological interventions for cerebral infarction prevention.
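
The two summary measures named in the Methods can be written out as a short sketch: the SHR as observed over expected events, and the AER as excess events per 10,000 person-years. The abstract reports neither the expected count nor the person-years, so the expected value of 1,987 below is only a back-calculation from the rounded SHR of 1.4, and the AER inputs are hypothetical numbers chosen for illustration. The function names are assumptions.

```python
def standardised_hospitalisation_ratio(observed, expected):
    """SHR: observed events divided by events expected from the general population."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


def absolute_excess_risk(observed, expected, person_years, per=10_000):
    """AER: excess events (observed minus expected) per `per` person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return (observed - expected) / person_years * per


# 2,782 observed events is from the abstract; 1,987 expected is an
# approximate back-calculation from the rounded SHR, not a reported figure.
print(round(standardised_hospitalisation_ratio(2782, 1987), 1))  # 1.4

# Entirely hypothetical inputs, to show the AER arithmetic only.
print(absolute_excess_risk(observed=150, expected=100, person_years=100_000))
```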

    Optimizing substitution matrix choice and gap parameters for sequence alignment

    Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
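
To make concrete how a substitution score and a gap penalty jointly determine a pair-wise global alignment, here is a minimal Needleman-Wunsch scoring sketch. It is not POP and does not use VTML, BLOSUM, or PAM values: the `toy_score` function (match +2, mismatch -1) and the linear gap penalty of -4 are assumptions standing in for a real matrix and for the affine penalties typically used in practice.

```python
def global_alignment_score(a, b, score, gap=-4):
    """Optimal global alignment score of a and b (Needleman-Wunsch).

    score(x, y): substitution score for aligning residues x and y.
    gap: linear penalty added for each gapped position.
    Keeps only two rows of the dynamic-programming table.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            ))
        prev = cur
    return prev[-1]


def toy_score(x, y):
    """Illustrative stand-in for a substitution matrix lookup."""
    return 2 if x == y else -1


print(global_alignment_score("GATTACA", "GATTACA", toy_score))  # 14
print(global_alignment_score("GATTACA", "GATACA", toy_score))   # 8
```

Changing `gap` or swapping `toy_score` for a lookup into a real matrix changes which alignment is optimal, which is the parameter choice the abstract is about.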